Using Linguistic Principles to Recover Empty Categories

Author

  • Richard Campbell
Abstract

This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. Unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early Government-Binding theory (Chomsky, 1981), the syntactic theory that underlies the annotation. Using the evaluation metric proposed by Johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. Some problems with this evaluation metric are noted, and an alternative is proposed along with the results. The paper considers the reasons a principle-based approach to this problem should outperform corpus-based approaches, and speculates on the possibility of a hybrid approach.

Similar resources

Chasing the ghost: recovering empty categories in the Chinese Treebank

Empty categories represent an important source of information in syntactic parses annotated in the generative linguistic tradition, but empty category recovery started to receive serious attention only very recently, after substantial progress in statistical parsing. This paper describes a unified framework for recovering empty categories in the Chinese Treebank. Our results show that ...

Empty Categories in Hindi Dependency Treebank: Analysis and Recovery

In this paper, we first analyze and classify the empty categories in a Hindi dependency treebank and then identify various discovery procedures to automatically detect the existence of these categories in a sentence. For this we make use of lexical knowledge along with the parsed output from a constraint-based parser. Through this work we show that it is possible to successfully discover certai...

Dealing with

This paper proposes how to extend Pappi, a principles and parameters parser which can currently parse ten languages based on the same core grammar, to handle nominalization constructions in Mandarin Chinese by deliberately revising the periphery file without affecting the other languages. There are four categories of nominalizations: relative clauses with a head noun as an adjunct, relative cla...

Freyd categories are Enriched Lawvere Theories

Lawvere theories provide a categorical formulation of the algebraic theories from universal algebra. Freyd categories are categorical models of first-order effectful programming languages. The notion of sound limit doctrine has been used to classify accessible categories. We provide a definition of Lawvere theory that is enriched in a closed category that is locally presentable with respect to ...

Designing a structured linguistic play therapy program for reading disorder: Basics and Strategies

Background & Purpose: Linguistic play therapy is a structured intervention based on the linguistic core of reading that can be modified and implemented for students with reading problems and disorders. The purpose of this study is to provide the theoretical foundations, solutions, and design principles of linguistic play therapy to empower teachers and counselors related to educational service...

Publication date: 2004